Algorithm Development for SDI Weapons System Allocation


Abstract: 


While several SDI weapons systems can provide adequate defense on a
one-on-one basis, a coordinated attack by several enemy missiles launched
over a substantial volume will be difficult to resist without an efficient
command and control system for warfare coordination. Our study of weapons
allocation and coordination algorithms is based on dynamical models for the
missile/decoy systems, including noise effects and uncertainties in the
model parameters. Performance of the weapons targeting system may be
measured in terms of the expected number of targets eliminated in a given
interval (phase of operations) or the expected time to eliminate all the
targets in a given region. Scheduling weapons deployment is a problem of
constrained optimal stochastic scheduling and resource allocation for a
system with many controls (weapons) and state variables. The selection of
weapons deployment tactics is based on solution of a complex optimization
problem. We have conducted an investigation of advanced modeling,
stochastic control, and scheduling methodologies for aspects of the SDI
weapons allocation problem: several platforms with assets of different
character defending against a diverse collection of targets. The models for
such scenarios lead to stochastic scheduling problems which cannot be
handled by conventional analytical methods. We describe several different
analytical approaches which have the potential for synthesis of effective
engagement algorithms.
Executive Summary:


We have conducted an investigation of advanced modeling, stochastic
control, and scheduling methodologies for aspects of the SDI weapons
allocation problem: several platforms with assets of different character
defending against a diverse collection of targets. The models for such
scenarios lead to stochastic scheduling problems which cannot be handled by
conventional analytical methods. We discuss several different analytical
approaches which have the potential for synthesis of effective engagement
algorithms.

Key  Words:  Weapons  allocation,  stochastic  sequencing  and  scheduling, 
index  rules. 
1 Identification and Significance of the Problem

While several SDI weapons systems can provide adequate defense on a
one-on-one basis, a coordinated attack by several enemy missiles launched
over a substantial volume² will be difficult to resist without an efficient
command and control system for warfare coordination. Battle Management
systems for a region, or the weapons allocation systems for individual
stations in the region, require automated decision-making systems to
rapidly evaluate the alternative actions and select deployment schemes
compatible with tactical and strategic doctrines. A starting point for the
development of such a system is the analysis of the coordination of various
spatially-separated platforms/stations, each with multiple capabilities, to
defend against several threats attacking simultaneously.

Coordination and contextual information are essential in tactical weapons
deployment. The use of certain systems increases the visibility of the
platform to a degree determined in part by its current position and
attitude. Expendable weapons and countermeasures must be “rationed” over
the course of an engagement. The interaction of weapons and countermeasures
can be constructive or it can hinder performance, depending on use and the
operational context. Both asynchronous and synchronous operating policies
for resources can be useful in a given situation.

² In space and time.




The weapons platforms and the targets will undergo significant movements
in their orbital paths during the (20 minute) duration of the mid-course
phase.³ Therefore, it is necessary to use dynamical models to describe
engagements during this phase. Since there may be significant uncertainty
in the measurements of target (and decoy) trajectories and profiles, it is
necessary to use models which account for this uncertainty.

Accordingly, our study of weapons allocation algorithms is based on
dynamical models for the missile/decoy systems, including noise effects and
uncertainties in the model parameters. Performance of the weapons targeting
system may be measured in terms of the expected number of targets
eliminated in a given interval (phase of operations) or the expected time
to eliminate all the targets in a given region. Other performance measures
are possible.

In  this  framework  scheduling  weapons  deployment  becomes  a  problem 
of  constrained  optimal  stochastic  scheduling  and  resource  allocation  for  a 
system  with  many  controls  (weapons)  and  state  variables.  The  selection  of 
weapons  deployment  tactics  is  based  on  solution  of  a  complex  optimization 
problem.  The  computational  complexity  of  this  problem  (number  of  variables 
which  must  be  computed)  grows  (at  least)  exponentially  with  the  number  of 
state  variables  in  the  system.  Since  this  is  a  function  of  the  number  of  targets, 
the  computational  problem  is  intractable  in  target  rich  environments. 

Thus, the complexity of the SDI weapons allocation problem requires the use
of algorithms incorporating not only advanced numerical techniques but also
heuristic procedures and efficient knowledge representation methods to
achieve performance levels approaching the announced SDI operational
requirements. Within the limited setting of this Phase I project we have
evaluated such techniques in the context of a systematic class of
analytical models for management of engagements under uncertainty.

³ In this project we shall focus attention on this phase.


2  Problem  Description 

We consider stochastic control and scheduling formulations for certain
aspects of the SDI weapons allocation problem: several platforms with
assets of different character defending against a diverse collection of
targets. The models for such scenarios lead to stochastic scheduling
problems which cannot be handled by conventional analytical methods. We
discuss several different analytical approaches which have the potential
for synthesis of effective engagement algorithms.

Since the weapons platforms are spatially distributed and mobile,
distributed processing, communications, and decision-making capabilities
enhance the reliability and survivability of the BM weapons C2 system.
While our primary effort has focused on the management of a single platform
with multiple resources, we have also examined models for a multiple
platform system with mobile command and operational units. We use
stochastic scheduling methodologies to optimize the performance of each
platform.
2.1 Distributed Layered Structure for Weapons System Management

In an engagement scenario, when several platforms deploy a variety of
weapons against several threats, conflicts and constraints arise. Timing or
precedence constraints are especially important. Erroneous threat type
identification may substantially reduce reaction time and choice of weapons
system response.

These observations suggest a layered control structure for weapons system
coordination. At the lower level, each individual station requires a
resource allocation algorithm capable of operating in a random environment
in the presence of time-precedence constraints [20, 21, 30, 53]. There are
several methodologies for such problems, including some promising recent
developments [3, 2, 35, 36, 37, 41, 49, 50]. At the higher level, when
several stations are involved, inaccuracies in threat identification may be
anticipated and significant communication requirements arise. For example,
careful timing is necessary to successfully “hand” a target from one
station to another. Difficult questions regarding synchronous or
asynchronous operation also arise when deployment occurs under changing
network topology and variable local data bases.

In this report we focus on the activities of the lowest operational level
in the hierarchy; however, the analytical tools used and developed are
sufficiently general that they can be brought to bear on many aspects of
other operational problems in the hierarchy. The abstract scheduling
methodology is especially germane.


2.2 Algorithm Development for Weapons System Allocation

The multiple station area weapons allocation coordination problem is a
version of the multi-server scheduling problem. Since few optimal
algorithms are known for this class of problems, we have examined a class
of suboptimal strategies based on the distributed, hierarchical structure
of the system. In this setup, each station is controlled by an “agent,”
which may be a computer. The agent executes a “local” control strategy to
deploy weapons assets to engage threats in his area. Constraints on the
deployment of weapons by neighboring agents assure that interference does
not arise. Since a system with a single “command center” is not survivable,
we assume that there are several BM systems as described above, and that
they share a common data base. Since the areas of influence of different
platforms may overlap, and since threats may pass through the areas
controlled by several platforms, coordination of the weapons allocation is
essential.

The agent's decision problem is to engage threats in his area effectively;
his performance measure is a “reward” for successful engagement of a threat
and a “penalty” for threats not engaged. The BM system's decision problem
is to see that threats are (continuously) successfully engaged as they pass
through the total area of influence of the agents under its command. Its
performance measure also includes a penalty for threats not engaged, and
possibly penalties for revealing the position of friendly units. The
information transmitted from agents to the BM system and vice versa will be
summary status information.

The stochastic scheduling model we use to represent the decision problem
may be solved by invoking strategies based on a priority index rule. The
index is a scalar quantity associated with each weapons system. (Indices
may also be associated with the targets.) Its numerical value depends on
the state of the weapon system, the threat data, and operational
constraints imposed on the system's actions. To solve his “local scheduling
problem,” the agent computes the vector of indices for his resources (and
in some cases indices for the targets) and implements the resource with the
largest index (or attacks the target with the largest index).
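
The selection step of an index rule can be sketched in a few lines of
Python. This is an illustration only: the names `weapon_index` and
`select_weapon`, and the particular index formula (kill probability times
threat value, set to zero when the weapon is not ready), are assumptions
introduced here for concreteness and are not taken from the models of this
report.

```python
def weapon_index(kill_prob, threat_value, ready):
    """Hypothetical scalar priority index for one weapon system.

    The formula is an assumed placeholder; a real index would depend on the
    weapon state, the threat data, and the operational constraints.
    """
    return kill_prob * threat_value if ready else 0.0


def select_weapon(weapons):
    """Compute the vector of indices and pick the resource with the largest one.

    `weapons` maps a weapon name to the keyword arguments of `weapon_index`.
    Returns the chosen name and the full index vector (the summary
    information an agent could report upward).
    """
    indices = {name: weapon_index(**state) for name, state in weapons.items()}
    best = max(indices, key=indices.get)
    return best, indices


if __name__ == "__main__":
    station = {
        "laser": dict(kill_prob=0.6, threat_value=10.0, ready=True),
        "interceptor": dict(kill_prob=0.9, threat_value=10.0, ready=False),
        "railgun": dict(kill_prob=0.5, threat_value=8.0, ready=True),
    }
    print(select_weapon(station))
```

The point of the sketch is the cost of the decision: once the indices are
available, the choice is a single maximization over a vector rather than a
search over joint deployment schedules.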

Thus, the state of the station's weapon system is described by a vector of
priority indices. This is the (summary) information the platform
communicates to superiors and other agents. In the hierarchical structure,
BM commanders can effectively direct the actions of station/platform agents
by imposing constraints and performance bounds. The latter are essentially
Lagrange multipliers, sometimes called “coordination variables,” in system
theory. These variables may be updated less frequently than the natural
frequency of agent actions.
2.3 Data Requirements and Implementation Issues


The  data  base  requirements  for  this  system  include: 

•  weapons  platform  states; 

•  data on threats;

•  operational data on other components; and

•  communication  network  status. 

The data base is distributed throughout the BM system. An agent's access to
the data is limited primarily by its communications and processing
abilities. Distributed communications and processing facilities will be
required in the weapons allocations subsystem to achieve the operational
flexibility, effectiveness, and survivability mandated by the SDI program.

The complexity of the area weapons system C2 problem and the large number
of state variables involved prohibit the computation of exact “optimal”
command and control strategies for each operational state, network
configuration, and threat scenario. Effective (suboptimal) coordination of
the weapons allocation system requires more than efficient computational
algorithms; it requires a logical support structure to delineate command
options, likely interference effects, etc. to the BM station. Its primary
function would be to keep track of precedence constraints affecting the
deployment tactics of neighboring platforms, to guide the procedure of
“handing a threat” from one platform to its neighbor to assure continuous
engagement of the threat's system, and to manage interplatform and
intersystem communications. During this phase, we have not undertaken the
development of a logic programming capability for this purpose as part of
the allocation algorithm.

We shall use the framework of cooperative team theoretic solutions to
“large scale” scheduling (allocation) problems as the analytical basis for
the development of efficient algorithms. As we shall argue below, this
framework provides a systematic basis for the construction of “suboptimal”
but satisfactory tactics for weapon allocation. It also provides analytical
procedures for the evaluation of performance, degrees of “optimality,” and
the evaluation of satisfactory solutions.


2.4  Summary 

We use stochastic dynamical models to represent the interaction of the
weapons platforms and the target systems during the post-boost and
mid-course phases of operations included in this project. Our computational
algorithms treat these models by various discretization procedures that
ultimately reduce them to discrete time Markov chain systems [33].⁴

Based on the engagement models, we use a two-step approach to the
development of weapons allocation algorithms. First, we have developed a
prototype set of algorithms using the stochastic gradient method. This
method has proven effective in the treatment of large scale network
planning problems involving a number of state variables comparable to the
dimensions one might expect for subsystems of the weapons-target engagement
system. These algorithms are described below.

⁴ Further work is recommended to enhance the models to include more
realistic trajectory and orbital dynamics and better noise representation.

The algorithms computed by this method may be used as a baseline for the
development of more representative strategies which reflect the operational
structure of the SDI BM weapons allocation system. We have used stochastic
scheduling models and index rules to derive dynamic engagement tactics for
a BM system involving several weapons platforms responding to a large
number of targets over an extended region of space. This class of
analytical procedures and the format of the resulting algorithms are
described below.

As we show, the technical problem of “coordinating” the actions of several
complex weapons platforms is highly nontrivial. Conventional optimization
procedures will not be effective; there is, in fact, no theory to support
such a development. For “practical purposes,” it is therefore appropriate
to supplement the algorithms with a “logical support system”⁵ using certain
AI techniques for “constraint directed scheduling.”

⁵ E.g., a “real time expert system.”
3 Analytical Models


3.1  An  Abstract  Model  for  Engagement  Dynamics 


We  will  discuss  an  abstract  mathematical  version  of  this  model  to  illustrate 
how  the  modeling  problems  and  the  design  of  “practical”  algorithms  based 
on  such  models  may  be  developed.  There  are  three  important  points  which 
we  wish  to  stress  before  describing  the  analysis: 

First, it is not possible to compute “optimal” control laws for the weapons
platforms in a realistic model of the weapons-target interaction. The
computational burden unavoidably grows exponentially with the number of
state variables. Since each target (and decoy) will have a minimum of six
state variables, and there may be thousands of targets and decoys, this is
an insurmountable problem which cannot be solved using any conceivable
computer technology.

Second, contrary to what one might expect, the computational problems
associated with “local control” strategies are worse than those for the
evaluation of global optimal strategies. That is, the problem of computing
feedback controls (engagement tactics) which have been partitioned to
respond to a subset of the states of the target population is more
demanding than the optimal control/allocation problem for the system taken
as a whole.

Third, if, however, the system dynamics have some special structural
properties, or more precisely if one can approximate the target system
model by one with special properties, then it is possible to derive
computationally feasible procedures for the evaluation of
control/allocation strategies. Two such properties are uncoupled dynamics
(target to target) and the case when the “equilibrium” probability
distribution of the target states has a product form. The first case
applies to targets which are individual missiles released from distinct
launching sites. The second case may be suitable as a description of the
distribution of a family of decoys released by a single missile (i.e., the
initial dependency would diminish as the decoy constellation assumes its
ultimate distribution).

To appreciate these points, consider the abstract stochastic control
problem

$$V(s, x) = \min_{u \in U} E\left[ \int_s^T e^{-\lambda t}\, c(x(t), u(t))\, dt \,\Big|\, x(0) = x \right]$$

$$\dot{x}(t) = b(x(t), u(t)) + w(t) \tag{1}$$

$$x(t) \in X \subset R^N, \qquad u(t) \in U \subset R^M$$

As a model for a specific SDI operational phase, the composite state vector
$x(t)$ contains the state vectors of the various targets, the weapons
platforms (position, velocity, attitude, etc.), and states for any
auxiliary processes (noise) which may be needed to complete specification
of the dynamics of all the interacting systems. The control vector $u(t)$
represents the parameters of the weapons systems involved in the
interaction which may be manipulated to direct the weapons to attack
selected targets, e.g., platform alignment. The vector $w(t)$ is a noise
process, nominally a “white noise.” The function $b$ describes the dynamics
of the various systems, including the orbital motions of the weapons
platforms and the trajectories of the targets and decoys in the
gravitational field. The parameter $\lambda$ is a “discount factor.” The
constraint sets $X$ and $U$ reflect physical or operational constraints.
(There may be many other types of constraints. Our main concern at this
point is an assessment of the computational problem, not the precise model
structure.)


3.1.1  Computational  Complexity 

The optimal cost $V(s, x)$ for the abstract problem (1) is found by solving
the Hamilton-Jacobi equation of dynamic programming:

$$\min_{u \in U} \left\{ b(x, u) \cdot \nabla V + c(x, u) + \Delta V - \lambda V \right\} = -\frac{\partial V}{\partial s} \tag{2}$$

which also gives the optimal strategy for problem (1) in feedback form.

The numerical solution of system (1) is virtually impossible when the
number $N$ of state variables is large. The problem is not simply one of
numerical analysis, but an irreducible difficulty. No matter what
(numerical) approximation method is used, achieving a given level of
precision as the dimension of the state space increases will require an
exponentially increasing computational cost. This is not even a consequence
of the optimization formulation: the associated linear eigenvalue problem
has the same computational complexity. In effect, the computational problem
is “NP-complete.”
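
The growth in cost can be made concrete with a small count. The sketch
below assumes a uniform grid with a fixed number of points per state
variable, which is only one of many possible discretizations; the function
names `state_dimension` and `grid_points` are introduced here for
illustration. The figure of six state variables per object is the minimum
quoted earlier in this section.

```python
def state_dimension(n_objects, states_per_object=6):
    """Number of state variables N for n_objects targets or decoys."""
    return n_objects * states_per_object


def grid_points(n_state_vars, points_per_axis):
    """Number of nodes in a uniform grid over an N-dimensional state space.

    A value function tabulated on this grid needs one entry per node, so
    storage and work both grow as points_per_axis ** N.
    """
    return points_per_axis ** n_state_vars


if __name__ == "__main__":
    for n_objects in (1, 2, 5):
        n = state_dimension(n_objects)
        nodes = grid_points(n, points_per_axis=10)
        print(f"{n_objects} object(s): N = {n}, grid nodes = {nodes:.3e}")
```

Even with a coarse grid of ten points per axis, a single object already
requires a million nodes, and each additional object multiplies that count
by another factor of a million.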

It is natural to attempt to avoid this problem by assigning individual
controllers (weapons platforms) to a portion of the state space, and to
select control laws in an optimal fashion to deal with just that subset of
the state space. This is optimization in the “class of local feedbacks.”
Unfortunately, in the absence of certain structural conditions on the
system dynamics, this problem is more demanding computationally than the
previous one.

To see that this is the case, let $I$ be an index set for the subsystems,
$I = \{1, 2, \ldots, k\}$, and let $n_i$ and $m_i$ denote the number of
states and controls, respectively, in subsystem $i \in I$. A local feedback
is a mapping $S_i$ from $[s, T] \times R^{n_i}$ into $U_i \subset R^{m_i}$,
the set of admissible values of the control for subsystem $i$. Let
$S_L = \{S = (S_1, \ldots, S_k)\}$ be the class of local feedbacks. If
$u_S$ is a local feedback control and $x_S$ is the corresponding solution,
then optimization in the class $S_L$ of local feedbacks is the problem

$$V(s, x) = \min_{S \in S_L} E_S\left[ \int_s^T e^{-\lambda t}\, c(x_S(t), u_S(t))\, dt \,\Big|\, x_S(0) = x \right]$$

$$\dot{x}_S(t) = b(x_S(t), u_S(t)) + w(t) \tag{3}$$

$$x_i(t) \in X_i \subset R^{n_i}, \qquad u_i(t) \in U_i \subset R^{m_i}$$

If we let $p^S(t, x)$ be the probability density of $x_S(t)$ corresponding
to local feedback $S \in S_L$, then a given strategy $R \in S_L$ may be
improved by the algorithm

Step 1: Compute $p^R$.

Step 2: Solve

$$\begin{cases} \mathcal{L}_S V^S + c^S = 0, \qquad V^S(T, \cdot) = 0 \\ S \in \operatorname{Arg} \left\{ \min_{S} H(t, S, p^R, V^S),\ S \in S_L \right\} \end{cases} \tag{4}$$

Here $c^S = c(x, S(x))$, $S \in S_L$, $H(t, R, p, V)$ is the Hamiltonian of
problem (3), and $p^R$ is the solution of

$$\mathcal{L}_R\, p^R = 0, \qquad p^R(0, \cdot) = p \tag{5}$$

$$\mathcal{L}_S = \frac{\partial}{\partial t} + \sum_j b_j^S \frac{\partial}{\partial x_j} + \sum_{i,j} a_{ij} \frac{\partial^2}{\partial x_i\, \partial x_j} \tag{6}$$

where $p$ is the initial density of the state $x(0)$, $\{a_{ij}\}$ is the
covariance matrix of the noise, and $b^S(t, x) = b(t, x, S(t, x))$.

A fixed point of this algorithm satisfies the conditions of optimality for
the problem (3). However, it is clear that the algorithm (4) requires more
computations than simple dynamic programming in equation (2).


3.1.2  Systems  with  Decoupled  Dynamics 

If the underlying dynamics of the target system are decoupled, and if the
controls (weapons) respond only to certain subsets of the state space, then
team theoretic (local control) strategies can be computed. The SDI
engagement scenario has this structure.

In this case we must have $b(t, x, u) = [b_1, \ldots, b_k]$ with

$$b_i(t, x_i, u_i) : (0, \infty) \times R^{n_i} \times U_i \to R^{n_i} \tag{7}$$

and we assume that the noises are not coupled between subsystems. Then, for
each local control strategy $R = [R_i] \in S_L$, the probability density of
the state $x$ satisfies $p^R = \prod_{i \in I} p_i^{R_i}$, and it may be
computed from equation (5) with the expression (7) substituted into the
operator (6). Let $\mathcal{L}_{i,R_i}$ be the operator with the
substitutions. The controls are still chosen through the combined
performance index in problem (1). The functionals

$$c_i^R = \int c(x, R(x)) \prod_{j \ne i} p_j^{R_j}(t, x_j)\, dx_j, \qquad i \in I \tag{8}$$

are the conditional expectations of the instantaneous cost based only on
knowledge at the $i$th subsystem. Using these functions, a sufficient
condition for a strategy $R$ to be optimal agent by agent is

$$\min_{R_i} \left[ \mathcal{L}_{i,R_i} V_i + c_i^R \right] = 0, \qquad i \in I \tag{9}$$

The corresponding optimal cost is $\mu_1(V_1) = \cdots = \mu_k(V_k)$ with

$$\mu_i(V_i) = \int_{R^{n_i}} \mu_i(dx_i)\, V_i(s, x_i) \tag{10}$$

where $\mu_i$ is the initial probability distribution of the subsystem
state $x_i$.

This result provides an algorithm for the computation of a strategy optimal
agent by agent.

Given $\epsilon, \nu \in (0, \infty)$:

Step 1: Choose $i \in I$ and solve equation (9). If
$\mu_i(V_i) < \nu - \epsilon$, then set $\nu := \mu_i(V_i)$ and

$$R_i \in \operatorname{Arg} \min_{R_i} \left[ \mathcal{L}_{i,R_i} V_i + c_i^R \right] \tag{11}$$

If not, then choose another $i \in I$ until
$\mu_i(V_i) \ge \nu - \epsilon$ for all $i \in I$.

Step 2: When $\mu_i(V_i) \ge \nu - \epsilon$ for all $i \in I$, set
$\epsilon := \epsilon / 2$ and go to Step 1.

This algorithm produces a decreasing sequence $\{\nu_n\}$ which converges
to a cost which is optimal agent by agent. A proof of convergence for the
discrete version of this algorithm is given in [44]. (Even at this level of
model simplification, it may be difficult to solve the problem in Step 1.
We shall discuss two procedures for reducing this problem further in
subsequent sections.)
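
The structure of the iteration can be seen in a finite, static analogue.
The Python sketch below replaces the solution of equation (9) by a direct
search over a finite action set for each agent, so it is a toy illustration
of the agent-by-agent improvement loop and not an implementation of the
continuous-time algorithm; the function name `agent_by_agent`, the
tolerance `tol`, and the starting strategy are assumptions made for the
example.

```python
def agent_by_agent(cost, n_actions, eps=1.0, tol=1e-9):
    """Agent-by-agent improvement on a finite team problem.

    cost      -- function mapping a tuple of actions (one per agent) to a number
    n_actions -- list with the number of available actions of each agent
    eps       -- initial improvement threshold (halved when no agent improves)
    tol       -- stop once eps falls to this level (assumed stopping rule)
    """
    strategy = [0] * len(n_actions)      # arbitrary initial joint strategy
    nu = cost(tuple(strategy))           # current team cost
    history = [nu]
    while eps > tol:
        improved = False
        for i in range(len(n_actions)):
            # Step 1: best response of agent i with the other agents held fixed.
            best = min(
                range(n_actions[i]),
                key=lambda a: cost(tuple(strategy[:i] + [a] + strategy[i + 1:])),
            )
            trial = strategy[:i] + [best] + strategy[i + 1:]
            value = cost(tuple(trial))
            if value < nu - eps:         # accept only a sufficient decrease
                strategy, nu = trial, value
                history.append(nu)
                improved = True
        if not improved:
            eps /= 2                     # Step 2: tighten the threshold
    return strategy, nu, history


if __name__ == "__main__":
    def team_cost(a):
        # Hypothetical coupled cost for two agents.
        return (a[0] - 2) ** 2 + (a[1] - 1) ** 2 + a[0] * a[1]

    print(agent_by_agent(team_cost, [4, 3]))
```

In this analogue each accepted step lowers the team cost by more than the
current threshold, so the recorded costs form a decreasing sequence, and at
termination no single agent can improve the cost by more than `tol` by
changing its own action alone. Such a point need not be a global optimum in
general.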

The analysis in [44] establishes the same sequence of arguments (that is, a
procedure for reducing the computational requirements of optimal stochastic
control problems) for Markov chain and queuing system models of controlled
discrete time systems. Such models may be useful in developing high level
strategies for SDI interception operations. For example, simple birth and
death type processes may be useful in describing the transition of targets
through a region, particularly in cases where bursts of decoys are
generated by a hard target during the transition.
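
As a minimal illustration of such a model, the sketch below computes the
stationary distribution of a birth and death chain with constant rates,
truncated at a maximum population. Constant arrival and engagement rates
and the truncation level are assumptions made only for this example (they
correspond to the classical single-server queue with finite capacity); the
function names are likewise hypothetical.

```python
def birth_death_stationary(arrival_rate, engagement_rate, capacity):
    """Stationary distribution of the number of targets in a region.

    Targets enter at `arrival_rate` and are removed (engaged or passed on)
    at `engagement_rate`; at most `capacity` targets are tracked. Detailed
    balance gives pi[n + 1] * engagement_rate = pi[n] * arrival_rate.
    """
    rho = arrival_rate / engagement_rate
    weights = [rho ** n for n in range(capacity + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_targets(distribution):
    """Mean number of targets present under the given distribution."""
    return sum(n * p for n, p in enumerate(distribution))


if __name__ == "__main__":
    pi = birth_death_stationary(arrival_rate=1.0, engagement_rate=2.0, capacity=5)
    print([round(p, 4) for p in pi], round(expected_targets(pi), 4))
```

Quantities such as the expected number of targets present are the kind of
aggregate measure that could inform commitment levels, without reference to
individual trajectories.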

These models, which do not account for the trajectories of the targets and
decoys, may be useful in deciding the commitment levels of weapons within
the region and in neighboring regions. Since they may be resolved by
efficient “index rules” and “stochastic gradient” methods, they support
“rapid prototyping development” of computational algorithms. In the next
three sections we describe the development of efficient computational
algorithms for models with special structures compatible with the
mid-course phase of SDI operations framework.
3.1.3 Systems with Product Form Performance Measures

The assumption that the dynamics of the system be completely decoupled is
unrealistic in most SDI operations scenarios. In the first instance, enemy
missiles will likely be launched in volleys, and groups launched from the
same geographical site may travel together en route to a designated target.
Alternatively, targets may release a family of associated decoys during the
course of a flight. The distributions (of the state vectors) of the decoys
and the parent target will be dependent, at least for the initial portions
of their flights. If, however, the distributions can be well approximated
by independent distributions in the limit of large times, that is, as the
distribution of decoys about the main target(s) stabilizes, then it is
possible to compute agent by agent optimal engagement strategies using the
second algorithm discussed above.

We shall omit most of the technical details, noting only the main points.
First, it is necessary to have the probability distributions of the states
of the target systems converge in the limit of “long times” to ergodic
distributions with the product form

$$P(x) = c \prod_{i=1}^{k} p_i(x_i) \tag{12}$$

with $c$ a normalization constant. By “long time” we mean times long
relative to the time constants of the release process for decoys, for
instance. If the release and “blooming” of the decoy configuration take
place over a matter of a few minutes, then the total time of the mid-course
phase may be considered long relative to the initial period of development.
Second, the control problem corresponding to this situation is either the
system (1) with $T = \infty$ or the “ergodic control problem” with the
average cost

$$\min_{S \in S_L} \lim_{T \to \infty} \frac{1}{T} \int_0^T c(x(t), S(x(t)))\, dt. \tag{13}$$

This  class  of  diffusion  process  models  may  be  regarded  as  the  natural  limits 
of  Jackson  networks  of  queues  [29]  under  the  scaling 

x  ~  1  ~  Jp  &s  N  oo  (14) 

As a description of an engagement between weapons and a target system, the
queues correspond to the targets exposed to a given weapon system (the
“server”). The output rate corresponds to the rate at which targets are
passed on to neighboring weapons systems. The transition probabilities
$m_{ij}$ describe the likelihood of a target passing through weapons region
$i$ and entering weapons region $j$.
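
The product form can be illustrated for the simplest case, an open Jackson
network of single-server stations with exponential service. The sketch
below is written under those assumptions; the function names, the
fixed-point iteration used to solve the traffic equations, and the sample
rates are choices made for this example rather than part of the models in
[29] or [44]. The routing matrix plays the role of the transition
probabilities $m_{ij}$.

```python
def traffic_rates(external, routing, iterations=200):
    """Solve the traffic equations lam_j = gamma_j + sum_i lam_i * m_ij.

    external -- external arrival rate gamma_j into each region
    routing  -- routing[i][j] = probability that a target leaving region i
                enters region j (row sums below 1 allow targets to leave)
    Plain fixed-point iteration; it converges when every target eventually
    leaves the network.
    """
    k = len(external)
    lam = list(external)
    for _ in range(iterations):
        lam = [
            external[j] + sum(lam[i] * routing[i][j] for i in range(k))
            for j in range(k)
        ]
    return lam


def product_form_probability(counts, lam, service):
    """P(n_1, ..., n_k) = prod_i (1 - rho_i) * rho_i ** n_i, rho_i = lam_i / mu_i."""
    prob = 1.0
    for n, rate, mu in zip(counts, lam, service):
        rho = rate / mu
        if rho >= 1.0:
            raise ValueError("station is overloaded; no stationary distribution")
        prob *= (1.0 - rho) * rho ** n
    return prob


if __name__ == "__main__":
    gamma = [1.0, 0.0]                 # targets enter through region 0 only
    m = [[0.0, 0.5], [0.0, 0.0]]       # half of them pass on to region 1
    lam = traffic_rates(gamma, m)
    print(lam, product_form_probability((0, 0), lam, service=[2.0, 1.0]))
```

Because the joint probability factors station by station, each weapons
region can be analyzed from its own utilization alone, which is the
property exploited by the agent-by-agent computations.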

Efficient algorithms for the solution of control problems for networks of
Jackson type were given in [44]. They have been applied to large scale
systems including numbers of state variables (several hundred) that might
be reasonably associated with subsystems of an SDI post-boost or mid-course
engagement system. We shall describe algorithms for optimization of models
of this type based on index rules in a subsequent section.

There are two ways in which the queuing network model may be associated
with a differential equation model like the abstract system in problem (1).
First, the queuing model can serve to provide a difference equation
numerical approximation of the solution of the continuous time control problem.
This is the approach developed in [33], which has become a standard
technique in the solution of control and scheduling problems. This method
preserves  the  information  on  the  trajectory  and  orbital  dynamics  contained  in 
the  differential  equations.  As  we  have  argued,  this  information  is  important 
during the long period of the mid-course phase, when significant motions of
the  weapons  platforms  will  take  place. 

Second,  the  diffusion  process  model  in  the  model  (1)  may  itself  be  an 
approximation  to  a  queuing  model  in  which  there  are  a  large  number  of 
elements, i.e., in the scaling in equation (14) with $N$ the total number of
targets/decoys in the model [31]. The use of diffusion approximations to
represent  large  populations  is  a  common  technique;  however,  we  are  not 
aware  of  any  studies  which  have  determined  that  this  would  be  an  effective 
class  of  models  for  any  phase  of  SDI  operations.  We  have  not  pursued  this 
point  in  this  project;  rather,  we  use  queuing  models  as  a  component  of  the 
numerical  analysis  of  the  system  model,  including  the  required  descriptions 
of  the  orbital  mechanics  and  target  features  such  as  aspect  angle. 

3.2  Monte  Carlo  and  Stochastic  Gradient  Methods 

From the previous sections we have seen that it will be possible to compute
the optimal local feedback controls which are the engagement strategies only
under certain limited conditions. There may be circumstances when the
information available for the design and execution of local strategies is poor.
In this case we may have an a priori design for the engagement strategy,
and it would be useful to have a method for evaluating and implementing
this strategy. One technique for accomplishing this is to parameterize the
strategy, and optimize the parameter in an “open loop” mode using a Monte
Carlo technique.

This is the underlying idea of the stochastic gradient method used in the
theory of stochastic approximations [34, 42]. This method has been applied
to advantage in large scale planning systems. It is easy to implement; it is
efficient; and it can be readily adapted to treat optimization problems with
integer valued variables. For example, it has been shown to be far more effective
in producing network designs in specific instances than a very efficient
simplex algorithm [13]. The primary use for this class of algorithms is in setting
up a prototype weapons allocation system which can be systematically
enhanced and upgraded in subsequent phases of the project.

Once  again,  we  shall  explain  the  method  and  the  algorithms  it  implies  in 
terms  of  an  abstract  dynamical  optimization  problem.  (We  discuss  “static” 
problems  a  little  later.)  The  method  is  general  and  can  be  applied  to  almost 
any  class  of  optimization  problems.  See  [34]  for  examples. 

Consider  the  system 


$$ \min_{u} E\left[ \int_0^T c(t, x(t), u(t))\, dt \right] \tag{15} $$

$$ \dot{x}(t) = b(t, x(t), u(t)) + w(t) $$

$$ x \in R^N, \quad u \in R^M $$

We make the feedback transformation

$$ u(t) = S(t, x(t), v(t)), \quad v \in R^P \tag{16} $$

with the function $S : (0, \infty) \times R^N \times R^P \to R^M$ given. This is the a priori
strategy. Since we wish to compute the best such strategy (parameterized
by time functions $v(\cdot)$) in “open loop” form, we approximate the probability
law $P$ of the noise in terms of the empirical distribution

$$ P \approx \frac{1}{r} \sum_{j=1}^{r} \delta_{w^j}(w) \tag{17} $$

where $w^j$ are trajectories of the noise obtained by a random generator and $\delta$ is
the Dirac delta function. Now we must solve the deterministic optimization
problem 

$$ \dot{x}^j(t) = b(t, x^j(t), S(t, x^j(t), v(t))) + w^j(t) \tag{18} $$

$$ \min_{v} \frac{1}{r} \sum_{j=1}^{r} \int_0^T c(t, x^j(t), S(t, x^j(t), v(t)))\, dt $$

where $w^j(t)$ is a particular trajectory of the noise. This is a deterministic
optimization  problem,  which  we  can  solve  by  a  gradient  technique  or  the 
Pontryagin  minimum  principle. 
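
The sample-average problem (18) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the report: a scalar system $\dot{x} = u + w$, the a priori strategy $S(t, x, v) = -v x$ with a constant scalar gain $v$ (a simplification of the time functions $v(\cdot)$), a quadratic running cost, an Euler discretization, and a grid search standing in for the gradient technique. The same $r$ noise trajectories are reused for every candidate $v$, which is what makes the problem deterministic.

```python
import numpy as np

def sample_average_cost(v, noise, x0=1.0, dt=0.01, ctrl_weight=0.1):
    """Average cost over r fixed noise trajectories (rows of `noise`)."""
    x = np.full(noise.shape[0], x0)
    total = np.zeros(noise.shape[0])
    for k in range(noise.shape[1]):
        u = -v * x                                  # assumed strategy S(t, x, v)
        total += (x**2 + ctrl_weight * u**2) * dt   # assumed running cost c
        x = x + u * dt + noise[:, k]                # Euler step with noise increment
    return total.mean()

rng = np.random.default_rng(1)
r, n_steps, dt = 200, 500, 0.01
# r noise trajectories, drawn once and reused for every candidate parameter
noise = 0.5 * np.sqrt(dt) * rng.standard_normal((r, n_steps))

grid = np.linspace(0.0, 5.0, 26)
costs = [sample_average_cost(v, noise, dt=dt) for v in grid]
best_v = grid[int(np.argmin(costs))]
```

Because the noise is frozen, `costs` is an ordinary function of `v`, so any deterministic optimizer could replace the grid search.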

The idea of the stochastic gradient method is similar to this, but it uses
a recursive procedure to optimize the parameter $v(\cdot)$ of the strategy. Let
$J(v)$ be the integral in expression (15). Suppose we are able to compute the
gradient $\partial J / \partial v$ by an adjoint method (numerically, after a discretization in
which $v(\cdot)$ is finite dimensional). Then the stochastic gradient algorithm is
the recursive procedure

$$ v^{r+1} = P_V\left[ v^r - \rho_r \frac{\partial J}{\partial v}(v^r, \omega^r) \right], \quad \rho_r \in [0, \infty), \quad \forall r = 1, 2, \ldots \tag{19} $$

for any sequence of positive numbers $\{\rho_r\}$ with $\sum_r \rho_r = \infty$ and $\sum_r \rho_r^2 < \infty$.
Here $\omega^r$ is a (numerically) generated realization of the process noise, and $P_V$
is projection on the (finite dimensional) set $V$ where $v(\cdot)$ takes its values.

Under convexity conditions it is possible to show that the algorithm converges
globally [13]. Local convergence results are given in [34]. Under
favorable, but not unrealistic, smoothness and convexity conditions, it can be
shown that the convergence rate of the algorithm is optimal.
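
A minimal sketch of recursion (19) on a toy problem may help fix ideas. The cost, the constraint set, and all names below are assumptions chosen for illustration: the sample cost is $\frac{1}{2}\|v - \omega\|^2$ with $\omega$ Gaussian about a fixed mean, $V$ is the unit box, and the step sizes are $\rho_r = 1/r$, which satisfy the two summability conditions.

```python
import numpy as np

def projected_stochastic_gradient(grad, project, v0, sample_noise, n_iter=5000):
    """Iterate v <- P_V[v - rho_r * grad(v, omega_r)] with rho_r = 1/r."""
    v = np.asarray(v0, dtype=float)
    for r in range(1, n_iter + 1):
        rho = 1.0 / r                  # sum rho = inf, sum rho^2 < inf
        omega = sample_noise()         # one generated noise realization
        v = project(v - rho * grad(v, omega))
    return v

rng = np.random.default_rng(0)
mean = np.array([0.3, 2.0])            # assumed mean of the noise

def grad(v, omega):
    return v - omega                   # gradient of 0.5 * ||v - omega||^2

def project(v):
    return np.clip(v, 0.0, 1.0)        # projection onto the box V = [0, 1]^2

def sample_noise():
    return mean + rng.standard_normal(2)

v_star = projected_stochastic_gradient(grad, project, [0.5, 0.5], sample_noise)
```

For this toy cost the minimizer of the expected cost over the box is the projection of the mean, so the first component should settle near 0.3 and the second at the boundary value 1.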

There  is  an  additional  technical  point  which  must  be  addressed.  The 
SDI  interception  problem  requires  the  treatment  of  integer  valued  (random) 
variables.  It  is  clear  that  conventional  algorithms  for  integer  programming 
will  not  be  effective  for  this  problem.  It  is  possible  to  design  some  heuristic 
algorithms  based  on  the  stochastic  gradient  method  which  show  promise  for 
the  treatment  of  optimization  on  integer  valued  variables. 

Here is a simple modification of the basic stochastic gradient algorithm
which has been shown in [22] to work well for problems with integer valued
variables. Suppose we have to solve the problem on $N^M$, the set of $M$-tuples
of natural numbers:

$$ \min_{x \in N^M} E[f(x, \omega)] \tag{20} $$


Consider the following algorithm

$$ x_{n+1} = x_n - a \frac{\partial f}{\partial x}([x_n], \omega_n), \quad a \in (0, \infty) \text{ fixed} \tag{21} $$

where $[x_n]$ is the integer nearest to $x_n$. Evidently, the sequence $[x_n]$ cannot
converge. Rather, it moves on some recurrent set of points. Suppose for
simplicity these points belong to the hypercube $[0, 1]^M$. Let $p_i^1$ denote the
(relative) visit frequency of $[x_n]_i$ (the $i$th component) to the point 1. Let
$p_i^0$ be the visit frequency to 0. Then the solution $[x_n]^*$ at a given step is
determined by the maximum frequencies

$$ [x_n]_i^* = \begin{cases} 1 & \text{if } p_i^1 > p_i^0 \\ 0 & \text{if } p_i^1 < p_i^0 \end{cases} \tag{22} $$

This  procedure  may  be  improved  slightly  by  ordering  the  elements  of  [xn] 
according  to  a  scheme  depending  on  the  visit  frequencies.  Effective  results  for 
the  network  planning  problem  were  obtained  [22]  for  this  algorithm  in  3  steps 
(ordering  the  elements)  and  3000  iterations,  requiring  2  minutes  of  computer 
time.  This  compares  favorably  with  the  stochastic  gradient  algorithm  for  a 
continuous  variable. 
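
The rounding recursion (21) and the frequency rule (22) can be sketched as follows. The objective is an assumption made only for illustration: $f(x, \omega) = \frac{1}{2}\|x - (c + \omega)\|^2$ on $\{0, 1\}^M$ with a hypothetical target vector `c`, so the best binary point rounds each entry of `c`. The ordering refinement mentioned above is not included.

```python
import numpy as np

def integer_stochastic_gradient(grad, x0, sample_noise, a=0.05, n_iter=3000):
    """Fixed-step recursion on the rounded iterate, then a frequency vote."""
    x = np.asarray(x0, dtype=float)
    visits_to_one = np.zeros_like(x)
    for _ in range(n_iter):
        x_int = np.clip(np.rint(x), 0, 1)    # [x_n], kept inside the hypercube
        visits_to_one += x_int
        x = x - a * grad(x_int, sample_noise())
    p1 = visits_to_one / n_iter              # relative visit frequency of 1
    # p0 = 1 - p1, so the rule p1 > p0 is the same as p1 > 0.5
    return (p1 > 0.5).astype(int), p1

rng = np.random.default_rng(2)
c = np.array([0.9, 0.1, 0.8])                # hypothetical target vector

def grad(x_int, omega):
    return x_int - c - omega                 # gradient of the assumed f at [x_n]

def sample_noise():
    return 0.5 * rng.standard_normal(3)

solution, p1 = integer_stochastic_gradient(grad, [0.5, 0.5, 0.5], sample_noise)
```

The rounded iterate keeps switching between 0 and 1 in each coordinate, as the text notes; only the visit frequencies stabilize, and they select the binary point.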

Other, more complex algorithms for treating integer valued stochastic
approximation problems using penalization and a modification of the probability
law are given in [22]. However, the more complex algorithms do not
improve on the simple one (21), (22) in any significant way.

3.3  Index  Rules  and  Efficient  Implementations 

To achieve the objective of designing weapons allocation algorithms which
reflect the overall structure of the SDI BM operations system, it will be necessary
to provide even more efficient allocation and coordination strategies. In
this subsection we shall discuss an analytical framework for this. It involves
formulation of the weapons allocation problem as a scheduling problem, using
a particularly compact set of feedback strategies called “index rules” to
implement effective allocation schedules.

3.3.1  Stochastic Scheduling and Single Station Weapons Allocation

To see how the weapons system allocation and coordination problem may be
formulated as a scheduling problem, consider a simplified version: one station
with several weapons assets indexed by $j = 1, 2, \ldots, N$, with no precedence
relations among these assets. Suppose that each threat's dynamics are described
by a stationary Markov chain [32]. When threat $i$ is engaged by
resource $j$, the BM system (on a superior level) receives an immediate reward
$R(t) = R_j(x_i(t))$ and the threat's state changes to $x_i(t + 1)$ according to its
transition rule. The states of the threats not engaged remain unchanged. In
this simplified problem we can think of the “state” of the threat platform as:

$$ x_i(t) = \begin{cases} 0, & \text{threat is homing on target,} \\ 1, & \text{threat is not homing on target.} \end{cases} \tag{23} $$

The  reward  is  a  device  used  to  represent  the  instantaneous  significance 
of  each  threat  to  the  Battle  Manager  and  the  cost  of  using  weapon  system  j 
(e.g.,  probability  of  revealing  position).  It  summarizes  (in  a  very  simplified 
way) strategic doctrine and the rationale of the Battle Manager. In this
simple version the states of all threats are observed and the problem is to
schedule  the  order  in  which  the  threats  are  engaged  to  maximize  the  expected 
present  value  of  the  sequence  of  immediate  rewards 


$$ E\left[ \sum_{t=1}^{\infty} a^t R(t) \right] \tag{24} $$

where  0  <  a  <  1  is  a  fixed  discount  factor.  This  is  a  resource  allocation 
problem  [41]  called  the  multi-armed  bandit  problem  [52]. 

3.3.2  Index  Rules 

In the basic version of the multi-armed bandit problem there are $N$ independent
resources (machines). Let $x_i(t)$ be the state of resource (machine)
$i = 1, 2, \ldots, N$ at time $t = 1, 2, \ldots$ In the simplest version of this problem, at
each $t$ one must operate exactly one machine. If machine $i$ is selected, one
gets an immediate reward $R(t) = R_i(x_i(t))$ and its state changes to $x_i(t + 1)$
according to a stationary Markov transition rule; the states of the idle machines
remain frozen, $x_j(t + 1) = x_j(t)$, $j \neq i$. The states of all machines are
observed and the problem is to schedule the order in which the resources are
operated to maximize the expected present value of the sequence of immediate
rewards (24).

This problem was first formulated in the 1940s. The essential breakthrough
came when Gittins and Jones [17] showed that to each resource
(machine) $i$ is attached an index which is a function only of its state, and
that the optimal policy operates the resource with the largest current index.
This “index rule” is important because it converts the original $N$ dimensional
problem into $N$ one dimensional ones. The index was subsequently shown to
be [16, 18]


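$$ \nu_i(x_i) = \max_{\tau > 1} \frac{E\left[ \sum_{t=1}^{\tau - 1} a^t R_i(x_i(t)) \mid x_i(1) = x_i \right]}{E\left[ \sum_{t=1}^{\tau - 1} a^t \mid x_i(1) = x_i \right]} \tag{25} $$

where the maximization is over all stopping times $\tau > 1$. This is the dynamic
allocation index (DAI), interpreted as the maximum expected reward per unit
of discounted time.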


A direct, formal solution and interpretation can be given to the weapons
system allocation and coordination problem in this framework. More important
is the fact that indices can be computed efficiently and quickly given
models for the threat dynamics (e.g., Markov transition models).
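
As one illustration of how index (25) might be computed for a finite-state Markov model, the sketch below uses the restart-in-state formulation of Katehakis and Veinott. This is a swapped-in technique, not the method of the references cited here. For each state $x$ it solves, by value iteration, a problem in which one may either continue from the current state or restart from $x$; the index is then $(1 - a)$ times the value at $x$. The transition matrix `P`, reward vector `R`, and the helper `index_rule` are hypothetical.

```python
import numpy as np

def gittins_indices(P, R, a, tol=1e-10, max_iter=100_000):
    """Index of every state of one finite Markov reward chain (restart method)."""
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    nu = np.zeros(len(R))
    for x in range(len(R)):
        V = np.zeros(len(R))
        for _ in range(max_iter):
            cont = R + a * (P @ V)             # continue from each state
            V_new = np.maximum(cont, cont[x])  # or restart from state x
            done = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if done:
                break
        nu[x] = (1.0 - a) * V[x]               # reward per unit discounted time
    return nu

def index_rule(states, indices):
    """Pick the resource whose current state has the largest index."""
    return int(np.argmax([indices[i][s] for i, s in enumerate(states)]))

# Toy chain: state 0 pays 0 and moves to state 1, which pays 1 forever.
P = [[0.0, 1.0], [0.0, 1.0]]
R = [0.0, 1.0]
nu = gittins_indices(P, R, a=0.5)              # approximately [0.5, 1.0]
choice = index_rule([0, 1], [nu, nu])          # two copies of the chain
```

Each resource is handled separately, which reflects the reduction of an $N$ dimensional problem to $N$ one dimensional ones.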

In our approach to the BM weapons allocation and coordination problem
for the single platform problem, each resource (machine) $i = 1, 2, \ldots, N$ is
characterized by the pair of sequences

$$ \{X^i(s), F^i(s)\}, \quad s = 1, 2, \ldots \tag{26} $$

$X^i(s)$ is the random reward obtained when $i$ is operated for the $s$th time
and $F^i(s)$ is the information (a $\sigma$-field) about machine $i$ gathered after it
has been operated $(s - 1)$ times. At each time exactly one machine must be
operated. Thus, $t = t^1 + t^2 + t^3 + \cdots + t^N$ where $t^i = t^i(t)$ is the number of
times $i$ is operated during $1, 2, \ldots, t$. The decision at time $t + 1$ is based on
the available information


$$ F(t) = \bigvee_i F^i(t^i + 1), \quad t = 1, 2, \ldots \tag{27} $$


A policy $\pi$ is any sequence of decisions that satisfies this information
constraint. The problem is then to find the policy $\pi$ that maximizes

$$ V(\pi) = E\left[ \sum_{t=1}^{\infty} a^t X^{i(t)}(t^{i(t)}(t)) \,\Big|\, F(1) \right] \tag{28} $$

where  i(t)  is  the  machine  operated  at  time  t. 

In this general situation the index for resource (machine) $i$ after it has
been operated $(s - 1)$ times is

$$ \nu_i(s) = \max_{\tau > s} \frac{E\left[ \sum_{t=s}^{\tau - 1} a^t X^i(t) \mid F^i(s) \right]}{E\left[ \sum_{t=s}^{\tau - 1} a^t \mid F^i(s) \right]} \tag{29} $$

where the maximization is over all stopping times $s < \tau < \infty$ of $F^i(\cdot)$. The
index rule is to operate the machine with the largest index.


Several extensions of the preceding framework are necessary to capture
a realistic weapons engagement scenario: more than one weapons technique
can be operated at a time; additional constraints due to precedence rules may
appear; the size of the problem may be very large if parametric dependence
is to be investigated; and pre-emptive strategies must be considered.

3.4  Multiple Station Weapons Coordination


This is a much more difficult problem than the single platform case. It is
a version of the “multi-server” scheduling problem which encompasses all
the difficulties of multi-agent stochastic control. One must solve a large-scale
dynamic programming problem which is intractable in general cases.
For the purposes of this application demonstration, we shall simplify the
problem by adopting a specific, suboptimal form for the solution. The presumed
structure is a two-layer one with several coordinators (BM stations)
on the top level and individual agents (computers) controlling separate
stations/platforms, each equipped with one or more weapons, on the lower level.
The agents respond to commands from designated BM stations. Each agent
uses a “local feedback strategy,” deploying weapons in response to its perception
of the threat (perhaps as defined by the BM station) while observing
operational and tactical constraints imposed by the BM station. The individual
agents optimize their performance measures using an index rule, as
discussed in the last section. The BM station directs the activities of the agents
under its command by imposing operational constraints to satisfy global
operational objectives, including:

(i)  Concealing  the  positions  of  units; 

(ii)  Establishing  a  priority  for  threat  engagement; 

(iii)  Establishing  precedence  constraints  for  weapons  deployment;  and 
(iv) Coordinating weapons operations with other tactical operations (EW
deployment, maneuver control, etc.).

The individual agents communicate their actions to their BM station,
providing the instantaneous state of each weapon system (a conditional probability),
the index values of each weapon the agent controls, and the value
of the agent’s local performance measure. They may also communicate their
perception of the threat if it differs from that provided by the BM station.
The BM station communicates operational constraints (e.g., precedence constraints
on weapons deployment among agents), threat information, and performance
constraints to the agents. The latter are the Lagrange multipliers,
sometimes called “coordination variables” in systems theory.

This formulation limits inter-agent communications by channeling them
all through the upper level BM stations, which simplifies the problem tremendously.
It allows different kinds of agents on the lower level, including automated
stations and subordinate systems. It allows each agent to have
different information about the threat, derived from local sensor facilities.
By permitting each agent to use a locally optimal strategy subject only to
constraints imposed by the BM station, computation of a near optimal strategy
is reduced to a manageable level. The simplified structure also makes the
development of simulation scenarios straightforward. Permitting agents to
exercise local control strategies, as discussed in earlier sections, enhances the
survivability of the overall system. If communications to a command center
(coordinator) are lost, the agents lose constraint specification updates, sensor
updates, and performance multipliers supplied by the weapons commander.
However, they can continue to function, assuming they retain some access to
the sensor data, by optimizing and executing their local control actions.

It  is  difficult  to  prescribe  a  precise  optimal  decision  making  structure  for 
reconfiguring  the  system  under  stress;  however,  one  can  provide  an  expert 
system,  e.g.,  a  production  rule  system,  which  would  both  assist  the  local  area 
coordination  of  weapons  platform  operations  and  guide  the  reconfiguration 
of  the  weapons  C2  system  in  times  when  some  units  or  command  centers  are 
off-line.  We  have  not  addressed  this  problem  in  this  project. 


3.5  Sensor  Scheduling 

In this section we consider the problem of scheduling a suite of sensors for
the optimal detection of targets. The sensor scheduling problem⁶ involves the
simultaneous selection of a signal processing scheme (according to some
performance measure) together with the subset of sensors that collect the
data. The scheduling problem and the model on which it is based serve to
illustrate the key ideas in the treatment of other scheduling problems based
on models using stochastic differential equations. As such it is a “generic”
example.

Applications of this concept include multiple sensor platforms and distributed
sensor networks. On a platform with multiple sensors there is a
need to coordinate the data obtained from the different sensors, which may
include radar, infrared, and other sensor technologies. The data obtained
from different sensors will likely be of varying quality (as a function of range,
aspect, ambient noise, etc.), and systematic procedures are required for apportioning
confidence to different data sets and for basing decisions on the
composite data set. For example, radar trackers are more accurate at long
range than are infrared trackers; the reverse is true at short range.

⁶ The work in this section was supported in part by the Army Research Office under
contract DAAG-39-83-C-0028. The results discussed in this section are based on [7, 8].

In  sensor  networks  one  needs  to  coordinate  data  collected  from  a  large 
number  of  sensors  distributed  over  a  large  geographical  area.  Conflicts  must 
be  resolved  and  a  preferred  set  of  sensors  selected  (on  a  given  time  interval) 
and  utilized  in  detection,  estimation,  and/or  control  decisions. 

Sensor scheduling should be carried out on the basis of optimizing reasonably
defined performance measures. These should include not only terms
allocating penalties for errors in signal processing (detection and/or estimation),
but also costs for managing the sensor network,
e.g., costs for (de)activating sensors and for switching from one set of sensors
to another. For example, activating a radar sensor on a platform increases
the detectability of the platform, and this should be accounted as a switching
cost. Using a more accurate sensor with a more complete data output may
entail higher bandwidth communications and the allocation of more computational
power to that sensor. In certain networks use of a sensor may involve
physical movement of that unit, and this incurs a cost.

3.5.1  A Model Problem

Consider the problem of estimating the signal process $x(t) \in R^n$ based on a
collection of measurements $\{y^i(t), i = 1, 2, \ldots, M\}$ from sensors indexed by
$i \in \{1, \ldots, M\}$. (Each $y^i(t)$ can be vector-valued.) Suppose $x(t)$ is defined
by the diffusion process

$$ dx(t) = f(x(t))\,dt + g(x(t))\,dw(t), \quad x(0) = \xi, \quad 0 \le t \le T \tag{30} $$

with values $x(t) \in R^n$. Suppose the measurements satisfy

$$ dy^i(t) = h^i(x(t))\,dt + R_i^{1/2}\,dv^i(t), \quad y^i(0) = 0, \quad i = 1, \ldots, M \tag{31} $$

with values in $R^{d_i}$. Here $w(\cdot)$, $v^i(\cdot)$ are independent, standard Wiener processes
in $R^n$, $R^{d_i}$, respectively, and $R_i = R_i^T > 0$ are positive-definite $d_i \times d_i$
matrices.

If we are given the set of measurements $\{y^i(s), s \le t, i = 1, \ldots, M\}$, then
the problem of (detecting) estimating $x(t)$ is a standard problem in nonlinear
filtering theory [39]. Suppose, in contrast, that we may select among the
various signals $y^i(\cdot)$ during certain intervals of time, and base our estimates
of $x(t)$ on the best selection, which varies as a function of time. That is, we
wish  to  determine  the  optimal  utilization  schedule  for  the  suite  of  sensors, 
based  on  “running  costs”  for  using  sensors  and  “switching  costs”  for  changing 
the  set  of  active  sensors. 

Let $c_i(x)$ be the cost of using sensor $i$ when the state of the signal is $x$,
and let $k_{i0}(x)$ and $k_{0i}(x)$ be the respective costs of turning off and turning
on the $i$th sensor. The signal processing objective is to compute, at time $T$,
an estimate $\hat{\phi}(T)$ of a given function $\phi(x(T))$ of the state. It is natural to
use the least squares estimation error as the performance measure for this
function

$$ \text{Estimation Error} = E\{ |\phi(x(T)) - \hat{\phi}(T)|^2 \} \tag{32} $$

Now consider the problem of scheduling the sensors. First, it is necessary
to define a configuration of sensors. Let $\mathcal{N}$ be the set of all possible sensor
configurations. An element $\nu$ of $\mathcal{N}$ is an $M$-tuple of 1's and 0's. A 1 in
position $j$ means that the $j$th sensor is on; a 0 means that the sensor is off.
There are $N = 2^M$ elements in $\mathcal{N}$. A sensor schedule is a piecewise constant
map $u(\cdot) : [0, T] \to \mathcal{N}$. Let $\tau_j \in (0, T]$ be the switching times for the sensor
schedule $u$, that is, the time instants at which individual sensors are turned
on or off. Let $\nu$, $\nu'$ be the sensor configurations before and after a switching.
Then the cost of switching is

$$ K_{\nu\nu'}(x) = \sum_{\{i \in \nu\} \cap \{i \notin \nu'\}} k_{i0}(x) + \sum_{\{i \notin \nu\} \cap \{i \in \nu'\}} k_{0i}(x) \tag{33} $$

The total running cost associated with a configuration $\nu \in \mathcal{N}$ is

$$ c_\nu(x) := \sum_{\{i \in \nu\}} c_i(x) \tag{34} $$

In (33) and (34) the symbol $\{i \in \nu\}$ denotes the indices of the entries in the vector
$\nu$ occupied by a 1; i.e., the sensors which are “on.” The symbol $\{i \notin \nu\}$
denotes the set of indices corresponding to sensors that are “off.” We shall
assume that the running and switching cost functions $c$, $k$ are bounded and
continuous as functions of $x$. Moreover, we shall assume that the switching
costs are bounded from below by a positive constant.
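
The bookkeeping behind the configuration set and the costs (33), (34) can be sketched in a few lines. The function names and the constant cost values are hypothetical; each per-sensor cost is passed as a function of the signal state `x`, as in the definitions above.

```python
from itertools import product

def configurations(M):
    """All 2**M on/off configurations of M sensors, as tuples of 0s and 1s."""
    return list(product((0, 1), repeat=M))

def running_cost(nu, c, x):
    """Running cost (34): sum of c_i(x) over the sensors that are on in nu."""
    return sum(c[i](x) for i, on in enumerate(nu) if on)

def switching_cost(nu, nu_new, k_off, k_on, x):
    """Switching cost (33): turn-off costs plus turn-on costs for changed sensors."""
    cost = 0.0
    for i, (before, after) in enumerate(zip(nu, nu_new)):
        if before and not after:
            cost += k_off[i](x)      # sensor i switched off
        elif after and not before:
            cost += k_on[i](x)       # sensor i switched on
    return cost

# Hypothetical constant costs for M = 3 sensors.
M = 3
c = [lambda x, w=i + 1: float(w) for i in range(M)]   # c_i(x) = i + 1
k_off = [lambda x: 0.5] * M
k_on = [lambda x: 2.0] * M

all_configs = configurations(M)
run = running_cost((1, 0, 1), c, x=0.0)
switch = switching_cost((1, 0, 1), (0, 1, 1), k_off, k_on, x=0.0)
```

Here `run` adds the costs of sensors 1 and 3, and `switch` charges one turn-off (sensor 1) and one turn-on (sensor 2); the unchanged sensor 3 contributes nothing.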


In this framework a sensor scheduling strategy is defined by an increasing
sequence of switching times $\tau_j \in [0, T]$ and the corresponding sequence $\nu_j \in \mathcal{N}$
of active sensor configurations. Let

$$ u(t) = \nu_j, \quad t \in [\tau_j, \tau_{j+1}); \quad j = 1, 2, \ldots $$

be the notation for a strategy.


We are interested in finding the optimal sensor scheduling strategy and simultaneously
determining the optimal estimator $\hat{\phi}$ for each active sensor configuration.
Given a strategy and the associated estimator $\hat{\phi}$, the corresponding
cost is

$$ J(u(\cdot), \hat{\phi}) = E\left\{ |\phi(x(T)) - \hat{\phi}(T)|^2 + \int_0^T c(x(t), u(t))\,dt + \sum_j k(x(\tau_j), u(\tau_{j-1}), u(\tau_j)) \right\} \tag{39} $$


where we have introduced the notation

$$ c(x, \nu) = c_\nu(x), \quad x \in R^n, \ \nu \in \mathcal{N} $$

$$ k(x, \nu, \nu') = K_{\nu\nu'}(x), \quad x \in R^n, \ \nu, \nu' \in \mathcal{N} $$


The optimal scheduling/estimation problem is to find among all admissible
scheduling strategies and associated estimators the pair achieving

$$ \inf_{u(\cdot),\, \hat{\phi}} J(u(\cdot), \hat{\phi}) \tag{40} $$


3.5.2  A  Stochastic  Control  Formulation 

As shown in [7, 8], the optimization problem can be reformulated as an
optimal stochastic control problem with “impulse type” controls. To simplify
the problem, suppose $\phi(x) = x$. Then the optimal estimator (for any sensor
configuration) is the conditional mean

$$ \hat{\phi}(T) = E^{u(\cdot)}\{ x(T) \mid F_T \} $$

where $E^{u(\cdot)}\{\cdot \mid \cdot\}$ is conditional expectation with respect to the probability
distribution induced by $u(\cdot)$ and

$$ F_T = \sigma\{ y(t, u(\cdot)),\ t \le T \} $$

is the ($\sigma$-algebra of) measurements available from the scheduling strategy
over the observation interval. Let $\mu(u, t)$ be the conditional probability measure
of $x(t)$ given $F_t$ on $R^n$. Then the conditional mean as a best estimate
can be written as

$$ \hat{\phi}(T) = \Phi(\mu(u, T)) = \int_{R^n} x \, d\mu(u, T) \tag{41} $$

which we regard as a vector-valued functional of $\mu(u, T)$.

As a result of this simple transformation the scheduling/estimation cost
may be rewritten as a function of the scheduling strategy $u(\cdot)$ alone

$$ J(u(\cdot)) = E^{u(\cdot)}\left\{ \| x(T) - \Phi(\mu(u, T)) \|^2 + \int_0^T c(x(t), u(t))\,dt + \sum_{j=1}^{\infty} k(x(\tau_j), u(\tau_{j-1}), u(\tau_j))\, \chi_{\tau_j < T} \right\} \tag{42} $$

where $\chi_{\tau_j < T}$ is the characteristic function of the set of (random) events with
$\tau_j < T$.


Since  we  assumed  that  the  switching  costs  are  bounded  from  below,  if  the 
observation  interval  is  finite,  then  the  optimal  cost  will  be  finite,  and  there 
will  be  only  a  finite  number  of  switchings  among  the  sensor  configurations 
during $[0, T]$. Because the control for (42) is a pure switching control, we
shall follow standard terminology and call it an impulsive control [9].

The optimization problem becomes the following impulse control problem:
Find an admissible impulsive control $u^*(\cdot)$ such that

$$ J(u^*(\cdot)) = \inf_{u(\cdot) \in U_{ad}} J(u(\cdot)) $$

where $U_{ad}$ is the set of all impulsive control laws adapted to the observations
$F_t^{y, u(\cdot)}$.

This problem falls into the class of optimal stochastic (impulse) control
problems with partial observations. It can be converted into a problem
with complete observations by introducing an evolution equation, that is,
a Zakai equation, for the conditional probability distribution of $x(t)$ based
on the observations.

Let $p(u(\cdot), t)$ be the conditional probability measure

$$ p(u(\cdot), t)(\phi) = E\{ \zeta(t)\, \phi(x(t)) \mid F_t^{y, u(\cdot)} \} $$

for each control $u(\cdot)$. Here

$$ \zeta(t) = \exp\left\{ \int_0^t h(x(s), u(s))^T dz(s) - \frac{1}{2} \int_0^t \| h(x(s), u(s)) \|^2 ds \right\} $$

defines the change of measure in the Girsanov transformation

$$ \frac{dP^{u(\cdot)}}{dP} = \zeta(T) $$

so that under the probability measure $P^{u(\cdot)}$ the process

$$ v(t) = z(t) - \int_0^t h(x(s), u(s))\,ds $$

is a standard Wiener process.⁷ The function $h$ is the vector in (36) with each
element multiplied by $R_i^{-1/2}$, $i = 1, \ldots, M$.


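For each control $u(\cdot)$, $p(u(\cdot), t)(\phi)$ is the unnormalized conditional probability
measure of $x(t)$ given the observations $F_t^{y, u(\cdot)}$. This function is the
“state vector” in the sensor scheduling problem. It satisfies

$$ dp(u(\cdot), t) = L^* p(u(\cdot), t)\,dt + p(u(\cdot), t)\,\delta(\cdot, u(t))^T dy(t, u(\cdot)) \tag{43} $$

$$ p(u(\cdot), 0) = p_0 $$

where $y(t, u(\cdot))$ is the control (schedule) dependent observations process and

$$ \delta(x, \nu) := \begin{bmatrix} R_1^{-1} h^1(x)\, \chi_\nu(1) \\ R_2^{-1} h^2(x)\, \chi_\nu(2) \\ \vdots \\ R_M^{-1} h^M(x)\, \chi_\nu(M) \end{bmatrix} $$

with $\chi_\nu(i)$ the indicator that sensor $i$ is on in configuration $\nu$.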


Thus, the infinite dimensional quantity $p(u(\cdot), \cdot)$ becomes the state vector
in the fully observed version of the problem. Using $p$ we can write the
estimation cost functional as

$$ J(u(\cdot)) = E\left\{ \Psi(p(u(\cdot), T)) + \int_0^T \langle p(u(\cdot), t), C(u(t)) \rangle\,dt + \sum_{i=1}^{\infty} \chi_{\tau_i < T} \langle p(u(\cdot), \tau_i), K(u_{i-1}, u_i) \rangle \right\} \tag{44} $$

where

$$ C(u_i) = c_{u_i}, \quad u_i \in \{1, 2, \ldots, N\} $$

$$ K(u_i, u_j) = K_{u_i u_j}, \quad u_i, u_j \in \{1, 2, \ldots, N\} $$

and $\Psi$ is a functional defined on measures⁸

$$ E^{u}\{ \| x(T) - \Phi(\mu(u, T)) \|^2 \} = E\{ \Psi(p(u(\cdot), T)) \} $$

⁷ The process $x(\cdot)$ retains its probability law under $P^{u(\cdot)}$ due to the independence of
the noises and the initial conditions.

To  summarize,  this  formulation  converts  the  optimal  sensor  scheduling 
problem  based  on  partial  information  -  the  noisy  measurements  y  -  which  has 
a  finite  dimensional  state  space,  into  a  problem  with  full  state  observations, 
but  an  infinite  dimensional  state  space. 

3.5.3  Solution  of  the  Optimization  Problem 

A solution to the optimal sensor scheduling problem, specifically, a set of variational inequalities defining the transitions in the sensor configuration and the switching times, can be derived from a dynamic programming argument. Let $u(t) = j$ be a fixed sensor schedule, and let $p_j$ be the corresponding density $p(\cdot, j)$. Then

$$dp_j = L^* p_j\, dt + p_j\, (h^j)^T dz(t) \qquad (45)$$

$$p_j(0) = \pi, \quad j \in \{1, 2, \ldots, N\}$$

⁸ In fact, $\Psi(\mu) = \mu(\chi^2) - \|\mu(\chi)\|^2 / \mu(1)$, where $\chi(x) = x$, $\chi^2(x) = \|x\|^2$, $x \in R^n$, and $\mu$ is any finite measure on $R^n$ such that $\mu(\chi)$, $\mu(\chi^2)$ are defined.
Let $p_{j,\pi}$ denote the solution to (45). Set

$$\Phi_j(t)(F)(\pi) = E\{F(p_{j,\pi}(t))\} \qquad (46)$$

Then $\Phi_j(t)$ is a semigroup (because $p_j(t)$ is a Markov process) which we shall use to define the evolution of the cost in the scheduling process.
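The semigroup property can be checked in one line. The sketch below assumes, beyond what is stated above, that the coefficients in (45) do not depend on time, so that restarting the solution from the value reached at time $t$ reproduces the law of the continued solution.

```latex
\begin{aligned}
\Phi_j(t+s)(F)(\pi)
  &= E\{F(p_{j,\pi}(t+s))\} \\
  &= E\{\, E\{F(p_{j,\pi}(t+s)) \mid p_{j,\pi}(t)\} \,\} \\
  &= E\{(\Phi_j(s)F)(p_{j,\pi}(t))\} \\
  &= \Phi_j(t)\bigl(\Phi_j(s)F\bigr)(\pi).
\end{aligned}
```

The second equality is the tower property of conditional expectation, and the third uses the Markov property together with the time-invariance assumption.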

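To simplify the presentation, consider the case $N = 2$. Let

$$C_i = C(i), \quad i = 1, 2$$

$$K_1 := K(1, 2)$$

$$K_2 := K(2, 1)$$

and let $C_i(\pi) = \langle C_i, \pi \rangle$ with the other quantities similarly defined.

Now consider the set of functionals $U_1(\pi, t)$, $U_2(\pi, t)$ such that

$$U_1(\pi, t) \ge 0, \quad U_2(\pi, t) \ge 0$$

$$U_1(\pi, T) = U_2(\pi, T) = \Psi(\pi)$$

$$U_1(\pi, t) \le \Phi_1(s - t)\, U_1(\pi, s) + \int_t^s \Phi_1(\lambda - t)\, C_1(\pi)\, d\lambda \qquad (47)$$

$$U_2(\pi, t) \le \Phi_2(s - t)\, U_2(\pi, s) + \int_t^s \Phi_2(\lambda - t)\, C_2(\pi)\, d\lambda \qquad (48)$$

$$\forall s \ge t$$

and

$$U_1(\pi, t) \le K_1(\pi) + U_2(\pi, t) \qquad (49)$$

$$U_2(\pi, t) \le K_2(\pi) + U_1(\pi, t) \qquad (50)$$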
These expressions have the following interpretation:

$$U_i(\pi, 0) = \inf_{\substack{u(\cdot):\ u(0) = i \\ p(0) = \pi}} J[u(\cdot)], \quad i = 1, 2 \qquad (51)$$

is the optimal sensor scheduling and estimation performance for the system starting at time zero with the initial configuration indicated. Suppose we start with $u(0) = 1$; then so long as (49) holds with strict inequality we should use schedule $j = 1$, since the optimal performance in this configuration is less than the cost of switching to configuration $j = 2$ and then continuing optimally thereafter, which is the right side of (49). The optimal performance during the period prior to a switch is determined by (47), which holds with equality prior to the switch. The latter is the equation of dynamic programming which governs the choice of the optimal estimation law while sensor configuration $j = 1$ is being used.

At any time $t$ when condition (49) holds with equality, it is optimal to switch from configuration $j = 1$ to $j = 2$, and continue optimally thereafter. The sensor schedule is determined by the sequence of switching times. For example, suppose $i = 1$ in (51) and let

$$\tau_1^* = \inf\{\, t \le T : U_1(p_1(t), t) = K_1(p_1(t)) + U_2(p_1(t), t) \,\} \qquad (52)$$

Then $\tau_1^*$ is the optimal time to switch from configuration $j = 1$ to $j = 2$. Let

$$p^*(t) = p_1(t), \quad t \in [0, \tau_1^*].$$

Next define

$$\tau_2^* = \inf\{\, \tau_1^* \le t \le T : U_2(p_2(t), t) = K_2(p_2(t)) + U_1(p_2(t), t) \,\} \qquad (53)$$

Then $\tau_2^*$ is the optimal time to switch back from configuration $j = 2$ to $j = 1$. Let

$$p^*(t) = p_2(t), \quad t \in [\tau_1^*, \tau_2^*].$$

In  this  way  the  sequence  of  optimal  switching  times  is  constructed. 
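The construction of a switching time can be sketched numerically once the value functionals have been evaluated along a sampled trajectory. The helper below is a hypothetical illustration, not part of the method above: it assumes the three quantities in (52) are already available as arrays on a common time grid and returns the first grid index at which the switching condition holds with equality, up to a tolerance.

```python
import numpy as np

def first_switch_index(u_stay, u_other, k_switch, tol=1e-9):
    """First sample index where U_stay = K + U_other (within tol), else None.

    u_stay   : values of the current configuration's functional along the path
    u_other  : values of the alternative configuration's functional
    k_switch : switching cost evaluated along the path
    """
    u_stay = np.asarray(u_stay, dtype=float)
    u_other = np.asarray(u_other, dtype=float)
    k_switch = np.asarray(k_switch, dtype=float)
    gap = k_switch + u_other - u_stay      # nonnegative when (49) holds
    hits = np.flatnonzero(gap <= tol)      # samples where equality is reached
    return int(hits[0]) if hits.size else None
```

Applying the same helper with the roles of the two configurations exchanged, starting from the index just found, gives the next switching time in the sequence.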

In the general case when there are $N > 1$ sensor configurations, the computation of the switching times is based on the inequality

$$U_i(\pi, t) \le \min_{j \ne i} \{\, \langle K(i, j), \pi \rangle + U_j(\pi, t) \,\}$$

The system (47)-(50), appropriately modified, constitutes a set of quasi-variational inequalities defining the optimal sensor scheduling problem.
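One way to read "appropriately modified" for $N$ configurations, written with the same symbols as (47)-(50), is the system below. It is a sketch under the assumption that the $N = 2$ conditions carry over index by index; it is not quoted from a reference.

```latex
\begin{aligned}
& U_i(\pi, t) \ge 0, \qquad U_i(\pi, T) = \Psi(\pi), \\
& U_i(\pi, t) \le \Phi_i(s - t)\, U_i(\pi, s)
    + \int_t^s \Phi_i(\lambda - t)\, C_i(\pi)\, d\lambda,
    \qquad \forall s \ge t, \\
& U_i(\pi, t) \le \min_{j \ne i} \{\, \langle K(i, j), \pi \rangle + U_j(\pi, t) \,\},
    \qquad i = 1, \ldots, N.
\end{aligned}
```

Setting $N = 2$ recovers (47)-(50), with $K_1 = K(1,2)$ and $K_2 = K(2,1)$.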

3.5.4  Implementation  of  the  Algorithm 

The numerical treatment of systems of quasi-variational inequalities has only recently been attempted. The basic ideas are not substantially different from the treatment of the nonlinear partial differential equation of dynamic programming, the Hamilton-Jacobi equation. There is one substantial difficulty, however: the boundary of the domain on which the solution is defined, that is, the region in which the optimal policy continues between switchings, depends on the solution itself. Specifically, the switching set is defined by the solution to the continuation condition.
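The dependence of the switching set on the solution can be seen in a small discrete-time analogue. The sketch below is not the numerical method for the quasi-variational inequalities themselves; it assumes the information state has been replaced by a finite grid with a one-step transition matrix per configuration, and it solves the resulting two-mode switching problem by backward induction. All names (`solve_switching`, `P`, `c`, `k`, `psi`) are hypothetical, and modes are indexed 0 and 1 rather than 1 and 2.

```python
import numpy as np

def solve_switching(P, c, k, psi, n_steps, dt):
    """Backward induction for a two-mode optimal switching problem.

    P[i] : (n, n) one-step transition matrix of the gridded state in mode i
    c[i] : (n,) running cost rate in mode i
    k[i] : positive scalar cost of switching out of mode i
    psi  : (n,) terminal cost
    Returns U with shape (n_steps + 1, 2, n) and boolean switching sets
    with shape (n_steps, 2, n).
    """
    n = len(psi)
    U = np.empty((n_steps + 1, 2, n))
    switch = np.zeros((n_steps, 2, n), dtype=bool)
    U[n_steps, 0] = U[n_steps, 1] = psi
    for t in range(n_steps - 1, -1, -1):
        # cost of staying in each mode for one step, then acting optimally
        cont = [c[i] * dt + P[i] @ U[t + 1, i] for i in (0, 1)]
        for i in (0, 1):
            j = 1 - i
            # the switching set is only known once cont has been computed
            switch[t, i] = k[i] + cont[j] < cont[i]
            U[t, i] = np.where(switch[t, i], k[i] + cont[j], cont[i])
    return U, switch
```

Because both switching costs are positive, switching twice at the same instant is never optimal, which is why each mode only compares its own continuation cost with the other mode's continuation cost plus the switching cost. The computed `U` satisfies the discrete counterparts of (49) and (50) by construction.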

The  numerical  treatment  of  optimal  scheduling  conditions  was  beyond 
the  scope  of  this  effort. 
4  Preliminary Considerations: Decision Support Systems for Weapons Management


The framework we have defined for the problem of managing weapons resources deployed from different platforms, including a coordinator and the exchange of summary information by the local agents through the coordinator, effectively supports “heuristic optimization” strategies. This is important, since there is no hope of solving the multi-station coordination problem using conventional analytical methods.


4.1  Constraint  Directed  Reasoning 

Consider,  for  example,  the  method  of  constraint  directed  search  developed  by 
M.  Fox  [14,  15]  (see  also  [40]).  The  case  study  treated  by  Fox  is  job  shop 
scheduling  which  involves  the  selection  of  a  set  of  operations  whose  execution 
leads  to  the  completion  of  an  order;  and  the  assignment  of  start  and  finish 
times  and  resources  to  each  operation.  The  number  of  possible  schedules 
grows  exponentially  with  the  number  of  orders,  alternative  production  plans, 
the  number  of  substitutable  resources,  and  other  parameters  of  the  system. 
By fully integrating the constraints into the search/scheduling process it is possible to bound the generation and focus the selection of alternative solutions. In effect, this treats the job shop scheduling task by “constraint-directed reasoning” [14].
Fox defines a four “level” procedure for constraint-directed scheduling of orders:

Level 1 Selects an order to be completed based on prioritization rules, its category, and due date.

Level 2 Does a capacity analysis of the plant to determine the earliest start time and the latest finish time for each operation associated with the order. This determines time binding constraints which are effective at the next level.

Level 3 Does a detailed scheduling of all resources necessary to produce the order. A “beam search” method is used to select the schedule, based on a pre-search analysis examining the constraints associated with the order (determining the direction of the search), including a determination of whether any new constraints should be generated. Level 3 outputs reservation time bounds for each resource required for the operations in the chosen schedule.

Level 4 Selects the actual reservations for the resources which minimize the “work-in-process” time.
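The beam search used at Level 3 can be illustrated with a generic sketch. This is not Fox's system: it assumes a single resource, a total-tardiness score, and precedence pairs as the only constraints, and the names `beam_search_schedule`, `durations`, `due`, and `precedes` are hypothetical. Partial schedules that violate a constraint are never generated, and only the best `width` partial schedules survive each round.

```python
def beam_search_schedule(durations, due, precedes, width=3):
    """Sequence operations on one resource with a constraint-pruned beam search.

    durations : dict op -> processing time
    due       : dict op -> due date
    precedes  : set of (a, b) pairs meaning a must be scheduled before b
    Returns (sequence, total_tardiness) for the best schedule found.
    """
    ops = list(durations)
    beam = [((), 0, 0)]  # (partial sequence, finish time, tardiness so far)
    for _ in ops:
        candidates = []
        for seq, t, tard in beam:
            for op in ops:
                if op in seq:
                    continue
                # constraint check: every predecessor of op is already placed
                if any(b == op and a not in seq for a, b in precedes):
                    continue
                finish = t + durations[op]
                late = max(0, finish - due[op])
                candidates.append((seq + (op,), finish, tard + late))
        if not candidates:
            raise ValueError("precedence constraints are infeasible")
        candidates.sort(key=lambda cand: cand[2])  # focus on least tardiness
        beam = candidates[:width]                  # bound the search
    return beam[0][0], beam[0][2]
```

A narrow beam bounds the number of alternatives generated at each step, at the price of possibly missing the best schedule; widening it trades computation for solution quality.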

It is easy to draw certain parallels between this approach to selection of resource allocation schedules and aspects of the BM weapons system C2 problem. For example, the interaction of weapons deployed by the same platform and by neighboring platforms must be coordinated to avoid counterproductive interference effects, as we have noted. This requires observing causality and precedence relationships (among other variables). Constraint directed search procedures may be useful at some level in the C2 system for the delineation of options (which satisfy all the constraints). The hierarchical structure of the algorithm suggests the design of a “semi-automated” weapons management decision support system. For example, the kinds of tasks done on Levels 1 - 3 in Fox’s system could be automated. The deployment decisions made on Level 4 would be the responsibility of the BM system based on the constraint information output by the algorithms at Level 3.

The system developed by Fox for job shop scheduling was not intended for applications like management and scheduling of weapons/EW resources and engagements. For example, it has no facilities to describe the adversarial nature of SDI encounters and the attendant need to secure operations; it does not provide for continued service under stressed conditions (loss of units); it has no provision for evaluation and fusion of sensor data; etc. In addition it treats orders as isolated events. In SDI operations it is necessary to “track” the evolution of threats using a dynamic model of threats based on sensor information. Based on this, SDI engagements involve dynamic allocation problems, as we have argued. The methodology of Fox has no (apparent) means for accommodating dynamical relationships among arriving orders. We have discussed Fox’s work here as an example of some of the good work now under way in AI applied to resource allocation problems and to illustrate the way heuristic methods developed in one AI application can be used to suggest treatments in other contexts.
4.2  Other Related Work


Other interesting work includes the BATTLE fire control system developed at NRL by Slagle and Hamburger [46], and the work on applications of AI to C³I reported in [4, 6, 11]. The work in [12] is especially relevant to this project. Our treatment of the scheduling algorithm is more sophisticated than the branch and bound technique used in [12] (and the beam search method used in [14] for that matter). However, the modeling methodologies based on “flavors” used in [12] are very interesting.

From  a  more  general  point  of  view,  the  problem  of  coordinating  BM 
operations  over  an  extended  theater  should  be  addressed  using  “Distributed 
Artificial  Intelligence”  (DAI)  methods.  Some  preliminary  work  on  this,  and 
additional references may be found in the report [1]. It is our judgment that the theory and methodology of DAI techniques are not well developed, and that an application of these techniques to tactical battle management including weapons operations is premature at this time. However, in the long run this may be an important area to pursue.